A context vector model for information retrieval

Authors

  • Holger Billhardt
  • Daniel Borrajo
  • Victor Maojo
Abstract

In the vector space model for information retrieval, term vectors are pairwise orthogonal; that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are then used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI and CACM) show that incorporating term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the best-performing balance between semantic matching and direct word matching varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.
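The abstract does not give the formulas used, so the following is only a minimal sketch of the general idea, under stated assumptions: term context vectors are estimated here from document-level co-occurrence shares, a document (or query) context vector is taken as the frequency-weighted average of its terms' context vectors, and ranking uses cosine similarity. The function names, the toy collection, and the specific co-occurrence estimate are all hypothetical illustrations, not the authors' definitions.

```python
import math
from collections import Counter


def term_context_vectors(docs, vocab):
    """Assumed estimate: entry u of term t's vector is the share of
    documents containing t that also contain u."""
    doc_sets = [set(d) for d in docs]
    vectors = {}
    for t in vocab:
        containing = [s for s in doc_sets if t in s]
        vectors[t] = [
            sum(1 for s in containing if u in s) / len(containing)
            if containing else 0.0
            for u in vocab
        ]
    return vectors


def context_vector(tokens, term_vecs, vocab):
    """Assumed combination: frequency-weighted average of the context
    vectors of the known terms in a document or query."""
    counts = Counter(tok for tok in tokens if tok in term_vecs)
    total = sum(counts.values())
    vec = [0.0] * len(vocab)
    if total == 0:
        return vec
    for t, n in counts.items():
        for i, x in enumerate(term_vecs[t]):
            vec[i] += n * x / total
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy collection (hypothetical data, for illustration only).
docs = [
    ["heart", "attack", "treatment"],
    ["cardiac", "arrest", "treatment"],
    ["aircraft", "wing", "design"],
]
vocab = sorted({t for d in docs for t in d})
term_vecs = term_context_vectors(docs, vocab)
doc_vecs = [context_vector(d, term_vecs, vocab) for d in docs]

query_vec = context_vector(["heart"], term_vecs, vocab)
sims = [cosine(query_vec, dv) for dv in doc_vecs]
print(sims)
```

In this toy setup the query "heart" shares no word with the second document, so direct word matching would score it zero; the context vectors still give it a positive similarity because both texts co-occur with "treatment", while the unrelated third document stays at zero.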

Similar resources

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Context-based Information seeking behavior among students of Kharazmi University

Background and Aim: The present study has been done in order to survey contextualized information retrieval behavior by the students of Kharazmi University. Methods: This is descriptive applied research. Statistical population includes all the students currently studying at the Kharazmi University in the time of research. Sample of research includes 196 students selected by convenience sampling...

Can Vector Space Bases Model Context?

Current Information Retrieval models do not directly incorporate context, which is instead managed by means of techniques juxtaposed to indexing or retrieval. In this paper, direct modeling of context is addressed by proposing the full recover of the notion of vector space base, whose role has been understated in the formulation and investigation of the Vector Space Model. A base of a vector sp...

Information Retrieval System Assigning Context to Documents by Relevance Feedback

In this paper we have proposed user feedback driven Information retrieval model. The proposed model assigns weights to the retrieved documents based on its context. The documents are re-ranked based on the user profile and his feedback. Proposed Information retrieval system uses vector space model and expert system. Need for user profile and relevance of information while searching and extracti...

Journal:
  • JASIST

Volume 53

Publication date: 2002